Machine Learning (ML) approaches have been used to enhance the detection capabilities of Network Intrusion Detection Systems (NIDSs). Recent work has achieved near-perfect performance on binary and multi-class network anomaly detection tasks. Such systems depend on the availability of both network traffic classes (benign and malicious) during the training phase. However, attack data samples are often challenging to collect in most organisations, as security controls prevent known malicious traffic from penetrating their networks. Therefore, this paper proposes a Deep One-Class (DOC) classifier for network intrusion detection that is trained only on benign network data samples. The novel one-class classification architecture combines a deep feed-forward network, which extracts useful features from network data, with efficient histogram-based outlier detection. The DOC classifier has been extensively evaluated on two benchmark NIDS datasets. The results demonstrate its superiority over current state-of-the-art one-class classifiers in terms of detection and false positive rates.
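To make the one-class idea concrete, here is a minimal sketch of histogram-based outlier scoring trained on benign samples only. It is a simplified stand-in for the paper's DOC architecture: the feature extractor is omitted, the HBOS-style scoring rule and the 99th-percentile threshold are assumptions for illustration.

```python
# Hypothetical sketch: histogram-based one-class scoring on benign-only data.
# Not the paper's exact DOC model; the scoring rule and threshold are assumed.
import numpy as np

def fit_histograms(benign_features, n_bins=20):
    """Build one density histogram per feature dimension from benign data."""
    hists = []
    for j in range(benign_features.shape[1]):
        counts, edges = np.histogram(benign_features[:, j], bins=n_bins, density=True)
        hists.append((counts + 1e-9, edges))  # smooth to avoid log(0)
    return hists

def outlier_score(x, hists):
    """HBOS-style score: sum of negative log densities; higher = more anomalous."""
    score = 0.0
    for j, (counts, edges) in enumerate(hists):
        idx = np.clip(np.searchsorted(edges, x[j]) - 1, 0, len(counts) - 1)
        score += -np.log(counts[idx])
    return score

rng = np.random.default_rng(0)
benign = rng.normal(0, 1, size=(5000, 8))       # stand-in for benign flow features
hists = fit_histograms(benign)
threshold = np.quantile([outlier_score(x, hists) for x in benign[:500]], 0.99)
print(outlier_score(rng.normal(5, 1, size=8), hists) > threshold)  # likely True
```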
In this paper, we propose XG-BoT, an explainable deep graph neural network model for botnet node detection. The proposed model consists of a botnet detector and an explainer for automatic forensics. The XG-BoT detector can effectively detect malicious botnet nodes in large-scale networks. Specifically, it uses grouped reversible residual connections with a graph isomorphism network to learn expressive node representations from botnet communication graphs. The explainer in XG-BoT can perform automatic network forensics by highlighting suspicious network flows and related botnet nodes. We evaluated XG-BoT on real-world, large-scale botnet network graphs. Overall, XG-BoT outperforms state-of-the-art approaches in terms of key evaluation metrics. In addition, we show that the XG-BoT explainer can generate useful explanations for automatic network forensics based on GNNExplainer.
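A minimal sketch of a graph isomorphism network (GIN) node classifier in PyTorch Geometric, in the spirit of the abstract: the grouped reversible residual connections of XG-BoT are omitted for brevity, and all layer sizes are assumptions, not the paper's architecture.

```python
# Simplified GIN-based botnet node classifier (not XG-BoT's exact design).
import torch
from torch import nn
from torch_geometric.nn import GINConv

class BotnetGIN(nn.Module):
    def __init__(self, in_dim, hidden=64, n_layers=3):
        super().__init__()
        self.layers = nn.ModuleList()
        dims = [in_dim] + [hidden] * n_layers
        for d_in, d_out in zip(dims[:-1], dims[1:]):
            mlp = nn.Sequential(nn.Linear(d_in, d_out), nn.ReLU(), nn.Linear(d_out, d_out))
            self.layers.append(GINConv(mlp))
        self.head = nn.Linear(hidden, 2)  # benign vs. botnet node

    def forward(self, x, edge_index):
        for conv in self.layers:
            x = torch.relu(conv(x, edge_index))
        return self.head(x)

model = BotnetGIN(in_dim=16)
x = torch.randn(100, 16)                      # toy node features
edge_index = torch.randint(0, 100, (2, 400))  # toy communication edges
logits = model(x, edge_index)                 # per-node benign/botnet logits
```

An explanation step in the spirit of the abstract could then wrap the trained model with PyTorch Geometric's GNNExplainer to highlight influential edges.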
This paper investigates the application of Graph Neural Networks (GNNs) to self-supervised network intrusion and anomaly detection. GNNs are a deep learning approach for graph-based data that incorporate graph structure into learning in order to generalize over graph representations and output embeddings. Since network flows are naturally graph-based, GNNs are well suited to analyzing and learning network behavior. Recent implementations of GNN-based network intrusion detection systems (NIDSs) rely heavily on labeled network traffic, which can not only restrict the amount and structure of input traffic but also limit the potential of NIDSs to adapt to unseen attacks. To overcome these limitations, we propose Anomal-E, a GNN approach to intrusion and anomaly detection that leverages edge features and graph topological structure in a self-supervised process. To the best of our knowledge, this is the first successful and practical approach to network intrusion detection that utilizes network flows in a self-supervised, edge-leveraging GNN. Experimental results on two modern benchmark NIDS datasets not only clearly show the improvement of using Anomal-E embeddings over raw features, but also the potential Anomal-E has for detection on wild network traffic.
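A minimal sketch of the two-stage pattern the abstract describes: learn edge embeddings without labels, then run a classical anomaly detector on them. The random-projection "encoder" below is a placeholder for a real self-supervised GNN encoder such as Anomal-E's, and the IsolationForest detector and contamination rate are assumptions for illustration.

```python
# Hypothetical pipeline: self-supervised edge embeddings -> anomaly detector.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
edge_feats = rng.normal(size=(10_000, 12))           # stand-in NetFlow edge features

# Stand-in embedding step (replace with a trained self-supervised GNN encoder).
W = rng.normal(size=(12, 32))
edge_emb = np.tanh(edge_feats @ W)

detector = IsolationForest(contamination=0.01, random_state=0).fit(edge_emb)
scores = -detector.score_samples(edge_emb)           # higher = more anomalous
print("flagged edges:", int((detector.predict(edge_emb) == -1).sum()))
```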
This paper presents a new Android malware detection method based on Graph Neural Networks (GNNs) with Jumping Knowledge (JK). Android function call graphs (FCGs) consist of a set of program functions and their inter-procedural calls. Accordingly, this paper proposes a GNN-based method for Android malware detection that captures meaningful intra-procedural call path patterns. In addition, the Jumping Knowledge technique is applied to minimize the effect of the over-smoothing problem, which is common in GNNs. The proposed method has been extensively evaluated using two benchmark datasets. The results demonstrate the superiority of our approach over state-of-the-art methods in terms of key classification metrics, which demonstrates the potential of GNNs in Android malware detection and classification.
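A minimal sketch of a graph-level classifier with Jumping Knowledge in PyTorch Geometric, loosely following the abstract; the SAGEConv backbone, layer sizes, and max-mode JK aggregation are assumptions, not the paper's exact architecture.

```python
# Simplified FCG malware classifier with Jumping Knowledge (assumed design).
import torch
from torch import nn
from torch_geometric.nn import SAGEConv, JumpingKnowledge, global_mean_pool

class MalwareGNN(nn.Module):
    def __init__(self, in_dim, hidden=64, n_layers=3):
        super().__init__()
        self.convs = nn.ModuleList(
            [SAGEConv(in_dim if i == 0 else hidden, hidden) for i in range(n_layers)]
        )
        self.jk = JumpingKnowledge(mode="max")    # combine all layer outputs
        self.head = nn.Linear(hidden, 2)          # benign vs. malware graph

    def forward(self, x, edge_index, batch):
        outs = []
        for conv in self.convs:
            x = torch.relu(conv(x, edge_index))
            outs.append(x)
        x = self.jk(outs)                          # mitigates over-smoothing
        return self.head(global_mean_pool(x, batch))

model = MalwareGNN(in_dim=32)
x = torch.randn(50, 32)                           # toy function-node features
edge_index = torch.randint(0, 50, (2, 120))       # toy call edges
batch = torch.zeros(50, dtype=torch.long)         # single graph in the batch
print(model(x, edge_index, batch).shape)          # torch.Size([1, 2])
```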
This paper presents a new Network Intrusion Detection System (NIDS) based on Graph Neural Networks (GNNs). GNNs are a relatively new sub-field of deep neural networks that can leverage the inherent structure of graph-based data. Training and evaluation data for NIDSs are typically represented as flow records, which can naturally be represented in graph format. This establishes the potential and the motivation for exploring GNNs for network intrusion detection, which is the focus of this paper. Current machine learning-based NIDS research considers network flows only in isolation, rather than taking their interconnected patterns into account. This is a key limitation for detecting sophisticated IoT network attacks, such as DDoS and distributed port scan attacks launched by IoT devices. In this paper, we propose a GNN approach that overcomes this limitation and allows the capture of a graph's edge features as well as its topological information for network anomaly detection in IoT networks. To the best of our knowledge, our approach is the first successful, practical, and extensively evaluated application of graph neural networks to the problem of network intrusion detection using flow-based data. Our extensive experimental evaluation on four recent NIDS benchmark datasets shows that our approach outperforms the state-of-the-art in terms of key classification metrics, which demonstrates the potential of GNNs in network intrusion detection and provides motivation for further research.
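A minimal sketch of edge-feature message passing for flow (edge) classification, inspired by the abstract: node embeddings are built purely from incident edge (flow) features, and each flow is then represented by its endpoints plus its own features. This is a simplified layer of my own, not the paper's implementation.

```python
# Hypothetical edge-feature message-passing layer for flow classification.
import torch
from torch import nn
from torch_geometric.nn import MessagePassing

class EdgeSAGELayer(MessagePassing):
    def __init__(self, edge_dim, hidden):
        super().__init__(aggr="mean")
        self.msg_lin = nn.Linear(edge_dim, hidden)

    def forward(self, edge_index, edge_attr, num_nodes):
        # Node embeddings are aggregated from incident edge features only.
        return self.propagate(edge_index, edge_attr=edge_attr,
                              size=(num_nodes, num_nodes))

    def message(self, edge_attr):
        return torch.relu(self.msg_lin(edge_attr))

edge_index = torch.randint(0, 30, (2, 200))       # toy flow graph
edge_attr = torch.randn(200, 10)                  # toy NetFlow features per edge
layer = EdgeSAGELayer(edge_dim=10, hidden=16)
node_emb = layer(edge_index, edge_attr, num_nodes=30)
# Classify each flow from its endpoint embeddings plus its own features.
src, dst = edge_index
flow_repr = torch.cat([node_emb[src], node_emb[dst], edge_attr], dim=1)
print(flow_repr.shape)                            # torch.Size([200, 42])
```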
In this paper, we propose a novel technique, namely INVALIDATOR, to automatically assess the correctness of APR-generated patches via semantic and syntactic reasoning. INVALIDATOR reasons about program semantics via program invariants, while it also captures program syntax via language semantics learned from a large code corpus using a pre-trained language model. Given a buggy program and the developer-patched program, INVALIDATOR infers likely invariants on both programs. Then, INVALIDATOR determines that an APR-generated patch overfits if it (1) violates correct specifications or (2) maintains erroneous behaviors of the original buggy program. In case our approach fails to determine an overfitting patch based on invariants, INVALIDATOR utilizes a model trained on labeled patches to assess patch correctness based on program syntax. The benefits of INVALIDATOR are three-fold. First, INVALIDATOR is able to leverage both semantic and syntactic reasoning to enhance its discriminative capability. Second, INVALIDATOR does not require new test cases to be generated; instead, it relies only on the current test suite and uses invariant inference to generalize the behaviors of a program. Third, INVALIDATOR is fully automated. We have conducted our experiments on a dataset of 885 patches generated on real-world programs in Defects4J. Experimental results show that INVALIDATOR correctly classifies 79% of overfitting patches, detecting 23% more overfitting patches than the best baseline. INVALIDATOR also substantially outperforms the best baselines by 14% and 19% in terms of Accuracy and F-Measure, respectively.
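A minimal sketch of the two-stage decision rule the abstract describes. All of the helpers here (`infer_invariants`, `violates`, `maintains`, `syntax_model`) are hypothetical placeholders for invariant inference and the pre-trained-language-model classifier; they are not INVALIDATOR's actual API.

```python
# Hypothetical decision logic: semantic check first, syntactic fallback second.
def assess_patch(buggy_prog, dev_patched_prog, apr_patched_prog,
                 infer_invariants, violates, maintains, syntax_model):
    correct_invs = infer_invariants(dev_patched_prog)  # likely-correct specification
    buggy_invs = infer_invariants(buggy_prog)          # behaviors of the buggy program

    # Stage 1: semantic reasoning over inferred invariants.
    if violates(apr_patched_prog, correct_invs):
        return "overfitting"   # breaks behavior a correct fix must preserve
    if maintains(apr_patched_prog, buggy_invs):
        return "overfitting"   # still exhibits the buggy behavior
    # Stage 2: fall back to the syntax-based classifier trained on labeled patches.
    return "overfitting" if syntax_model(apr_patched_prog) else "likely-correct"
```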
The task of reconstructing 3D human motion has wide-ranging applications. The gold-standard motion capture (MoCap) systems are accurate but inaccessible to the general public due to their cost, hardware, and space constraints. In contrast, monocular human mesh recovery (HMR) methods are much more accessible than MoCap, as they take single-view videos as inputs. Replacing multi-view MoCap systems with a monocular HMR method would break the current barriers to collecting accurate 3D motion, making exciting applications like motion analysis and motion-driven animation accessible to the general public. However, the performance of existing HMR methods degrades when the video contains challenging and dynamic motion that is not present in the MoCap datasets used for training. This reduces their appeal, as dynamic motion is frequently the target of 3D motion recovery in the aforementioned applications. Our study aims to bridge the gap between monocular HMR and multi-view MoCap systems by leveraging information shared across multiple video instances of the same action. We introduce the Neural Motion (NeMo) field. It is optimized to represent the underlying 3D motion across a set of videos of the same action. Empirically, we show that NeMo can recover 3D motion in sports using videos from the Penn Action dataset, where NeMo outperforms existing HMR methods in terms of 2D keypoint detection. To further validate NeMo using 3D metrics, we collected a small MoCap dataset mimicking actions in Penn Action, and show that NeMo achieves better 3D reconstruction compared to various baselines.
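A toy sketch of the core idea of fitting one shared motion field across several videos of the same action: an MLP maps (phase, per-video latent) to a pose and is fit to 2D observations. The loss, projection, and body model are drastically simplified placeholders of my own, not NeMo's formulation.

```python
# Toy shared-motion-field fitting (assumed simplification, not NeMo itself).
import torch
from torch import nn

n_videos, pose_dim, latent_dim = 4, 24, 8
field = nn.Sequential(nn.Linear(1 + latent_dim, 128), nn.ReLU(),
                      nn.Linear(128, pose_dim))
latents = nn.Parameter(torch.zeros(n_videos, latent_dim))   # per-instance variation

def project_to_2d(pose):
    # Placeholder for projecting a body model into each video's camera.
    return pose[..., :12]

opt = torch.optim.Adam(list(field.parameters()) + [latents], lr=1e-3)
t = torch.rand(n_videos, 16, 1)                              # sampled phases per video
obs_2d = torch.randn(n_videos, 16, 12)                       # stand-in 2D keypoints
for _ in range(100):
    inp = torch.cat([t, latents[:, None, :].expand(-1, 16, -1)], dim=-1)
    loss = ((project_to_2d(field(inp)) - obs_2d) ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()
```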
Learning with noisy labels (LNL) is a classic problem that has been extensively studied for image tasks, but much less so for video. A straightforward migration from images to videos that ignores the properties of videos, such as computational cost and redundant information, is not a sound choice. In this paper, we propose two new strategies for video analysis with noisy labels: 1) a lightweight channel selection method dubbed Channel Truncation for feature-based label noise detection, which selects the most discriminative channels to split clean and noisy instances in each category; and 2) a novel contrastive strategy dubbed Noise Contrastive Learning, which constructs the relationship between clean and noisy instances to regularize model training. Experiments on three well-known benchmark datasets for video classification show that our proposed truNcatE-split-contrAsT (NEAT) significantly outperforms existing baselines. While reducing the feature dimension to 10% of its original size, our method achieves a noise detection F1-score above 0.4 and a 5% classification accuracy improvement on the Mini-Kinetics dataset under severe noise (symmetric-80%). Thanks to Noise Contrastive Learning, the average classification accuracy improvement on Mini-Kinetics and Sth-Sth-V1 is over 1.6%.
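A minimal sketch of a channel-truncation idea as described in the abstract: keep the channels with the highest average activation for a category, score instances on those channels only, and split the lowest-scoring instances off as likely noisy. The keep/noise ratios and the scoring rule are assumptions, not NEAT's exact procedure.

```python
# Hypothetical channel truncation for per-category clean/noisy splitting.
import numpy as np

def truncate_and_split(features, keep_ratio=0.1, noise_ratio=0.2):
    """features: (n_instances, n_channels) for one labeled category."""
    k = max(1, int(features.shape[1] * keep_ratio))
    top = np.argsort(features.mean(axis=0))[-k:]        # most discriminative channels
    scores = features[:, top].mean(axis=1)              # per-instance agreement score
    cut = np.quantile(scores, noise_ratio)
    return scores >= cut                                # True = treated as clean

rng = np.random.default_rng(0)
feats = rng.normal(size=(256, 512))                     # toy per-instance features
clean_mask = truncate_and_split(feats)
print(clean_mask.mean())                                # ~0.8 kept as clean
```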
Mixup is a popular data augmentation technique for training deep neural networks in which additional samples are generated by linearly interpolating pairs of inputs and their labels. This technique is known to improve generalization performance in many learning paradigms and applications. In this work, we first analyze Mixup and show that it implicitly regularizes infinitely many directional derivatives of all orders. Based on this new insight, we then propose a method to improve Mixup. To demonstrate the effectiveness of the proposed method, we conduct experiments across various domains, such as images, tabular data, speech, and graphs. Our results show that the proposed method improves Mixup across various datasets using a variety of architectures, for instance, exhibiting an improvement over Mixup by 0.8% in ImageNet top-1 accuracy.
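For reference, a minimal sketch of standard Mixup as described in the first sentence: convex combinations of input pairs and their one-hot labels, with the mixing weight drawn from a Beta distribution. The alpha value is a common default, not a value taken from this paper, and this is baseline Mixup, not the paper's improved method.

```python
# Standard Mixup augmentation (baseline, with an assumed alpha=0.2 default).
import numpy as np

def mixup_batch(x, y_onehot, alpha=0.2, rng=np.random.default_rng()):
    lam = rng.beta(alpha, alpha)                  # mixing coefficient in [0, 1]
    perm = rng.permutation(len(x))                # random pairing within the batch
    x_mix = lam * x + (1 - lam) * x[perm]
    y_mix = lam * y_onehot + (1 - lam) * y_onehot[perm]
    return x_mix, y_mix

x = np.random.rand(32, 3, 32, 32)                 # toy image batch
y = np.eye(10)[np.random.randint(0, 10, 32)]      # one-hot labels
x_mix, y_mix = mixup_batch(x, y)
```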
Multi-document summarization (MDS) has traditionally been studied under the assumption that a set of ground-truth topic-related input documents is provided. In practice, the input document set is unlikely to be available a priori and would need to be retrieved based on an information need, a setting we call open-domain MDS. We experiment with current state-of-the-art retrieval and summarization models on several popular MDS datasets extended to the open-domain setting. We find that existing summarizers suffer large reductions in performance when applied as-is to this more realistic task, though training summarizers with retrieved inputs can reduce their sensitivity to retrieval errors. To further probe these findings, we conduct perturbation experiments on summarizer inputs to study the impact of different types of document retrieval errors. Based on our results, we provide practical guidelines to help facilitate a shift to open-domain MDS. We release our code and experimental results, alongside all data and model artifacts created during our investigation.
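A minimal sketch of the open-domain MDS pipeline the abstract contrasts with the traditional setting: retrieve topic-related documents for an information need first, then summarize the retrieved set. `retriever` and `summarizer` are hypothetical stand-ins, not the paper's models.

```python
# Hypothetical retrieve-then-summarize pipeline for open-domain MDS.
def open_domain_mds(query, corpus, retriever, summarizer, k=10):
    docs = retriever(query, corpus, top_k=k)   # may include retrieval errors
    return summarizer(docs)                    # summarizer sees retrieved, not gold, inputs
```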